interaction information, suffers from the problem that it is sometimes negative. Here we reconsider from first
principles the general structure of the information that a set of sources provides about a given variable. We begin
with a new definition of redundancy as the minimum information that any source provides about each possible
outcome of the variable, averaged over all possible outcomes. We then show how this measure of redundancy 
induces a lattice over sets of sources that clarifies the general structure of multivariate information. Finally, we 
use this redundancy lattice to propose a definition of partial information atoms that exhaustively decompose the 
Shannon information in a multivariate system in terms of the redundancy between synergies of subsets of the 
sources. Unlike interaction information, the atoms of our partial information decomposition are never negative 
and always support a clear interpretation as informational quantities. Our analysis also demonstrates how the 
negativity of interaction information can be explained by its confounding of redundancy and synergy.

I. INTRODUCTION

From its roots in Shannon's seminal work on reliability 
and coding in communication systems, information theory 
has grown into a ubiquitous general tool for the analysis of 
complex systems, with application in neuroscience, genetics, 
physics, machine learning, and many other areas. Somewhat 
surprisingly, the vast majority of work in information theory 
concerns only the simplest possible case: the information that 
a single variable provides about another. This is quantified 
by Shannon's mutual information, which is by far the most 
widely used concept from information theory [1]. The second
most popular concept, conditional mutual information, consid- 
ers interactions between multiple variables in only the most 
rudimentary sense: it seeks to eliminate the influence of other 
variables in order to isolate the dependency between two vari- 
ables of interest. In contrast, many of the most interesting and 
challenging scientific questions, such as many-body problems 
in physics [2], n-person games in game theory [3], and population
coding in neuroscience [4, 5], involve understanding the
structure of interactions between three or more variables. 

The two main attempts to generalize information theory 
to multivariate interactions are the total correlation proposed 
by Watanabe [6] (also known as the multivariate constraint
[7], multiinformation [8], and integration [9]) and the interaction
information of McGill [10] (also known as multiple
mutual information [11], co-information [12], and synergy
[13]). The total correlation, as its name suggests, measures
the total amount of dependency between a set of variables as 
a single monolithic quantity. Thus, the total correlation does
not provide any insight into how dependencies are distributed 
amongst the variables, i.e., it says nothing about the structure 
of multivariate information.

In contrast, interaction information was proposed as a measure
of the amount of information bound up in a set of variables
beyond that which is present in any subset of those variables.
Thus, entropy and mutual information correspond to
first- and second-order interaction information, respectively, 
and together with its third-, fourth-, and higher-order variants, 
interaction information provides a way of characterizing the 
structure of multivariate information. Interaction information 
is also the natural generalization of mutual information when 
Shannon entropy is viewed as a signed measure on information 
diagrams [12, 14, 15]. However, the wider use of interaction
information has largely been hampered by the "odd" [12]
and "unfortunate" [15] property that, for three or more variables,
the interaction information can be negative (see also
[11, 14, 16-18]). For information as it is commonly understood,
it is entirely unclear what it means for one variable to
provide "negative information" about another. Moreover, as 
we demonstrate below, the confusing property of negativity is 
actually symptomatic of deeper problems regarding the inter- 
pretation of interaction information for larger systems. As a 
result, there remains no generally accepted extension of infor- 
mation theory for characterizing the structure of multivariate 
interactions. 

Here we formulate a new perspective on the structure of 
multivariate information. Beginning from first principles, we 
consider the general structure of the information that a set of 
sources provide about a given variable. We propose a new 
definition of redundancy as the minimum information that any 
source provides about each outcome of the variable, averaged 
over all possible outcomes. Then we show how this definition 
can be used to exhaustively decompose the Shannon informa- 
tion in a multivariate system into partial information atoms 
consisting of redundancies between synergies of subsets of the 
sources. We also demonstrate that partial information forms 
a lattice that clarifies the general structure of multivariate
information. Unlike interaction information, the atoms of our
partial information decomposition are never negative and
always support a clear interpretation as informational quantities.
Finally, our analysis also demonstrates how the negativity of
interaction information can be explained by its confounding of
redundant and synergistic interactions.



II. THE STRUCTURE OF MULTIVARIATE
INFORMATION

Suppose we are given a random variable S and a random
vector $R = \{R_1, R_2, \ldots, R_{n-1}\}$. Then our goal is to
decompose the information that R provides about S in terms of the
partial information contributed either individually or jointly by
various subsets of R. For example, in a neuroscience context,
S may correspond to a stimulus that takes on different values
and R to the evoked responses of different neurons. In this
case, we would like to quantify the information that the joint
neural response provides about the stimulus, and to distinguish
between information due to responses of individual neurons
versus combinations of them [5, 13].

Consider the simplest case of a system with three variables.
How much total information does $R = \{R_1, R_2\}$ provide
about S? How do $R_1$ and $R_2$ contribute to the total
information? The answer to the first question is given by the mutual
information $I(S; R_1, R_2)$, while for the latter we can identify
three distinct possibilities. First, $R_1$ may provide information
that $R_2$ does not, or vice versa (unique information). For
example, if $R_1$ is a copy of S and $R_2$ is a degenerate random variable,
then the total information from R reduces to the unique
information from $R_1$. Second, $R_1$ and $R_2$ may provide the same
or overlapping information (redundancy). For example, if $R_1$
and $R_2$ are both copies of S then they redundantly provide
complete information. Third, the combination of $R_1$ and $R_2$
may provide information that is not available from either alone
(synergy). A well-known example for binary variables is the
exclusive-OR function $S = R_1 \oplus R_2$, in which case $R_1$ and
$R_2$ individually provide no information but together provide
complete information. Thus, intuitively, the total information
from R decomposes into unique information from $R_1$ and $R_2$,
redundant information shared by $R_1$ and $R_2$, and synergistic
information contributed jointly by $R_1$ and $R_2$ (FIG. 1).

FIG. 1. Structure of multivariate information for 3 variables. Labelled
regions correspond to unique information (Unq), redundancy (Rdn),
and synergy (Syn).

In sum, for three variables we can identify unique information,
redundancy, and synergy as the basic atoms of multivariate
information. In fact, as later developments will clarify,
unique information is best thought of as a degenerate form of
redundancy or synergy, so that redundancy and synergy alone
constitute the basic building blocks of multivariate information.
In particular, we will find that various combinations of
redundancy and synergy, which may at first sound paradoxical,
play a fundamental role in structuring multivariate information
in higher dimensions. Next we proceed to formalize these
ideas, beginning with the problem of defining a measure of
redundancy.



III. MEASURING REDUNDANCY



Let $A_1, A_2, \ldots, A_k$ be nonempty and potentially overlapping
subsets of R, which we call sources. How can we quantify
the redundant information that all sources provide about S?

Of course, the information supplied by each $A_i$ is given
simply by $I(S; A_i)$, the mutual information between S and
$A_i$. However, it is crucial to note that mutual information is
actually a measure of average or expected information, where 
the expected value is taken over outcomes of the random vari- 
ables. Thus, for instance, two sources might provide the same 
average amount of information, while also providing infor- 
mation about different outcomes of S. Stated formally, the 
information provided by a source A can be written as 



$$I(S; A) = \sum_{s} p(s)\, I(S = s; A) \tag{1}$$



where the specific information $I(S = s; A)$ quantifies the
information associated with a particular outcome s of S. Various
definitions of specific information have been proposed to quantify
different relationships between S and A (see Appendix A),
but for our purposes the most useful is

$$I(S = s; A) = \sum_{a} p(a \mid s) \left[ \log \frac{1}{p(s)} - \log \frac{1}{p(s \mid a)} \right]. \tag{2}$$

The term $\log \frac{1}{p(s)}$ is called the surprise of s, so $I(S = s; A)$ is the
average reduction in surprise of s given knowledge of A. In
other words, $I(S = s; A)$ quantifies the information that A
provides about each particular outcome $s \in S$, while $I(S; A)$
is the expected value of this quantity over all outcomes of S.

Given these considerations, a natural measure of redundancy 
is the expected value of the minimum information that any 
source provides about each outcome of S, or

$$I_{\min}(S; \{A_1, A_2, \ldots, A_k\}) = \sum_{s} p(s) \min_{A_i} I(S = s; A_i). \tag{3}$$

$I_{\min}$ captures the idea that redundancy is the information common
to all sources (the minimum information that any source
provides), while taking into account that sources may provide
information about different outcomes of S. Note that, like the
mutual information, $I_{\min}$ is also an expected value of specific
information terms. 
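
The definitions of specific information and of the redundancy measure can be made concrete with a short numerical sketch. The code below is an illustration only, not the authors' implementation: the function names (`marginal`, `specific_information`, `mutual_information`, `i_min`), the representation of a joint distribution as a dictionary from outcome tuples `(s, r1, r2)` to probabilities, and the two toy distributions are all assumptions made for this example.

```python
from collections import defaultdict
from math import log2


def marginal(joint, idx):
    """Marginalize a joint pmf {outcome tuple: probability} onto positions idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def specific_information(joint, s, source, s_idx=0):
    """I(S = s; A): average reduction in surprise of s given A, in bits."""
    source = tuple(source)
    p_s = marginal(joint, (s_idx,))[(s,)]
    p_a = marginal(joint, source)
    p_sa = marginal(joint, (s_idx,) + source)
    total = 0.0
    for key, p in p_sa.items():
        if key[0] != s or p == 0.0:
            continue
        a = key[1:]
        # p(a|s) * [log 1/p(s) - log 1/p(s|a)], with p(s|a) = p(s, a) / p(a)
        total += (p / p_s) * (log2(1.0 / p_s) - log2(p_a[a] / p))
    return total


def mutual_information(joint, source, s_idx=0):
    """I(S; A) as the expectation of specific information over outcomes of S."""
    p_s = marginal(joint, (s_idx,))
    return sum(p * specific_information(joint, s, source, s_idx)
               for (s,), p in p_s.items())


def i_min(joint, sources, s_idx=0):
    """Expected minimum specific information over the given sources."""
    p_s = marginal(joint, (s_idx,))
    return sum(p * min(specific_information(joint, s, A, s_idx) for A in sources)
               for (s,), p in p_s.items())


# Outcomes are tuples (s, r1, r2); a source is a tuple of positions.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}  # R1 and R2 are both copies of S
xor = {(r1 ^ r2, r1, r2): 0.25 for r1 in (0, 1) for r2 in (0, 1)}

print(i_min(copies, [(1,), (2,)]))      # redundancy between two copies of S
print(i_min(xor, [(1,), (2,)]))         # redundancy between the XOR inputs
print(mutual_information(xor, (1, 2)))  # information from the joint source
```

Under these assumptions, the two copies are fully redundant, while the exclusive-OR inputs share no redundant information even though the joint source is fully informative.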

$I_{\min}$ also has several important properties that further support
its interpretation as a measure of redundancy. First, $I_{\min}$
is nonnegative, a property that follows directly from the
nonnegativity of specific information (see Appendix D). Second,
$I_{\min}$ is less than or equal to $I(S; A_i)$ for all $A_i$'s, with equality
if and only if $I(S = s; A_i) = I(S = s; A_j)$ for all i
and j and all $s \in S$. Thus, as one would hope, the amount
of redundant information is bounded by the information provided
by each source, with equality if and only if all sources
provide the exact same information about S. Finally, and
closely related to the previous property, for a given source
A the amount of information redundant with A is maximal
for $I_{\min}(S; \{A\}) = I(S; A)$. In other words, redundant
information is maximized by the "self-redundancy," analogous
to the property that mutual information is maximized by the
self-information $I(S; S) = H(S)$.

What are the distinct ways in which collections of sources 
might contribute redundant information? Formally, answering 
this question means identifying the domain of $I_{\min}$. Thus
far, we have assumed that the natural domain is the collection 
of all possible sets of sources, but in fact this can be greatly 
simplified. To illustrate, consider two sources, A and B, with 
A a subset of B. Clearly, any information provided by A 
is also provided by B, so the redundancy between A and B 
reduces to the self-redundancy for A, 

$$I_{\min}(S; \{A, B\}) = I_{\min}(S; \{A\}) = I(S; A).$$

Furthermore, for any source C, it follows that
$I_{\min}(S; \{A, B, C\}) = I_{\min}(S; \{A, C\})$. Extending
this idea, for any collection of sources where some are
supersets of others, the redundancy for that collection is
equivalent to the redundancy with all supersets removed. Thus,
the domain for $I_{\min}$ can be reduced to the collection of all
sets of sources such that no source is a superset of any other. 
Formally, this set can be written as 

$$\mathcal{A}(R) = \{\alpha \in \mathcal{P}_1(\mathcal{P}_1(R)) : \forall A_i, A_j \in \alpha,\ A_i \not\subset A_j\}, \tag{4}$$

where $\mathcal{P}_1(R) = \mathcal{P}(R) \setminus \{\emptyset\}$ is the set of all nonempty
subsets of R. Henceforth, we will denote elements of $\mathcal{A}(R)$,
corresponding to collections of sources, with bracketed expressions
containing only the indices for each source. For instance,
$\{\{R_1, R_2\}\}$ will be {12}, $\{\{R_1\}, \{R_2, R_3\}\}$ will be {1}{23},
and so forth.
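
For small systems the reduced domain can be enumerated by brute force. The following sketch is only an illustration of the condition in Equation (4); the helper names (`nonempty_subsets`, `antichains`, `label`) are hypothetical, and sources are represented by index sets rather than by the variables themselves.

```python
from itertools import combinations


def nonempty_subsets(n):
    """All nonempty subsets of the index set {1, ..., n}, as frozensets."""
    indices = range(1, n + 1)
    return [frozenset(c) for r in range(1, n + 1)
            for c in combinations(indices, r)]


def antichains(n):
    """Nonempty collections of sources in which no source is a proper
    subset of another, i.e. the condition of Eq. (4)."""
    sources = nonempty_subsets(n)
    found = []
    for r in range(1, len(sources) + 1):
        for collection in combinations(sources, r):
            # a < b on frozensets tests whether a is a proper subset of b
            if not any(a < b for a in collection for b in collection):
                found.append(frozenset(collection))
    return found


def label(alpha):
    """Render a collection in the bracketed index notation, e.g. {1}{23}."""
    ordered = sorted(alpha, key=lambda a: (len(a), sorted(a)))
    return "".join("{" + "".join(str(i) for i in sorted(a)) + "}" for a in ordered)


for n in (2, 3):
    nodes = antichains(n)
    print(n, len(nodes), sorted(label(a) for a in nodes))
```

With two elements in R the enumeration yields four collections, and with three elements it yields eighteen. The exhaustive search grows very quickly and is intended only for such small cases.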

The possibilities for redundancy are also naturally structured, 
which is shown by extending the same line of reasoning to 
define an ordering $\preceq$ on the elements of $\mathcal{A}(R)$. Consider two
collections of sources, $\alpha, \beta \in \mathcal{A}(R)$, where for each source
$B \in \beta$ there exists a source $A \in \alpha$ with A a subset of B. This
means that for each source $B \in \beta$ there is a source $A \in \alpha$ such
that A provides no more information than B. The redundant
information shared by all $B \in \beta$ must therefore at least include
any redundant information shared by all $A \in \alpha$. Thus, we can
define a partial order over the elements of $\mathcal{A}(R)$ such that one
element (collection of sources) is considered to precede another
if and only if the latter provides any redundant information
that the former provides. The ordering relation $\preceq$ is formally
defined as

$$\forall \alpha, \beta \in \mathcal{A}(R),\ (\alpha \preceq \beta \Leftrightarrow \forall B \in \beta,\ \exists A \in \alpha,\ A \subseteq B). \tag{5}$$

Applying this ordering to the elements of $\mathcal{A}(R)$ produces a
redundancy lattice, in which a higher element provides at least 
as much redundant information as a lower one (FIG. 2; see 
Appendix C). 
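
The ordering relation of Equation (5) translates almost directly into code. The sketch below is a minimal illustration, assuming that collections are represented as sets of index sets; the helpers `node` and `precedes` are hypothetical names introduced here.

```python
def node(*sources):
    """Build a collection of sources from index strings, e.g. node("1", "23")."""
    return frozenset(frozenset(int(ch) for ch in src) for src in sources)


def precedes(alpha, beta):
    """alpha precedes beta iff every B in beta contains some A in alpha (Eq. 5)."""
    return all(any(a <= b for a in alpha) for b in beta)


bottom = node("1", "2", "3")  # redundancy among all three individual elements
top = node("123")             # self-redundancy of the full set R

print(precedes(bottom, node("1", "23")), precedes(node("1", "23"), top))
print(precedes(node("1"), node("2")), precedes(node("2"), node("1")))
print(precedes(node("12", "13"), node("12")), precedes(node("12"), node("12", "13")))
```

In this sketch {1} and {2} are incomparable, while {12}{13} precedes {12} but not the reverse, in line with the observation that a node for an individual source sits above the other nodes involving that source.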

The redundancy lattice provides a wealth of insight into the 
structure of redundancy. For instance, from the redundancy 
lattice it is possible to read off some of the properties of $I_{\min}$
noted earlier. The property that redundancy for a source is
maximized by the self-redundancy can be seen from the fact
that any node corresponding to an individual source appears
higher in the redundancy lattice than any other node involving
that source. For example, in FIG. 2B, the node labeled {12},
corresponding to the self-redundancy for the source $\{R_1, R_2\}$,
occurs higher than nodes labeled {12}{13}, {12}{13}{23},
and {3}{12}. Another property of $I_{\min}$ that can be seen from
these diagrams relates to the top and bottom elements of the
lattice. The top element corresponds to the self-redundancy for
R, reflecting the fact that $I_{\min}$ is bounded from above by the
total amount of information provided by R. At the other end of 
the spectrum, the bottom element corresponds to the redundant 
information that each individual element of R provides, with 
all other possibilities for redundancy falling between these two 
extremes. 



IV. PARTIAL INFORMATION DECOMPOSITION 

The redundant information associated with each node of the 
redundancy lattice includes, but is not limited to, the redundant 
information provided by all nodes lower in the lattice. Thus, 
moving from node to node up the lattice, $I_{\min}$ can be thought
of as a kind of "cumulative information function," effectively
integrating the information provided by increasingly inclusive
collections of sources. Next, we derive an inverse of $I_{\min}$
called the partial information function (PI-function). Whereas
$I_{\min}$ quantifies cumulative information, the PI-function measures
the partial information contributed uniquely by each
particular collection of sources. This partial information will 
form the atoms into which we decompose the total information 
that R provides about S. 

FIG. 2. Redundancy lattice for (A) 3 and (B) 4 variables.

For a collection of sources $\alpha \in \mathcal{A}(R)$, the PI-function,
denoted $\Pi_R$, is defined implicitly by

$$I_{\min}(S; \alpha) = \sum_{\beta \preceq \alpha} \Pi_R(S; \beta). \tag{6}$$

Formally, $\Pi_R$ corresponds to the Möbius inverse of $I_{\min}$
[19, 20]. From this relationship, it is clear that $\Pi_R$ can be
calculated recursively as

$$\Pi_R(S; \alpha) = I_{\min}(S; \alpha) - \sum_{\beta \prec \alpha} \Pi_R(S; \beta). \tag{7}$$

Put into words, $\Pi_R(S; \alpha)$ quantifies the information provided
redundantly by the sources of $\alpha$ that is not provided by any
simpler collection of sources (i.e., any $\beta$ lower than $\alpha$ on the
redundancy lattice). In Appendix D, it is shown that $\Pi_R$ can
be written in closed form as

$$\Pi_R(S; \alpha) = I_{\min}(S; \alpha) - \sum_{s} p(s) \max_{\beta \in \alpha^-} \min_{B \in \beta} I(S = s; B), \tag{8}$$

where $\alpha^-$ represents the nodes immediately below $\alpha$ in the
redundancy lattice. From this formulation, it is readily shown
that $\Pi_R$ is nonnegative (see Appendix D), and thus can be
naturally interpreted as an informational quantity associated
with the sources of $\alpha$.
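
The recursive inversion of the cumulative redundancy values can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: the helpers `node`, `precedes`, and `partial_information` are hypothetical, and the cumulative values on the two-source lattice are made-up numbers chosen only to exercise the recursion.

```python
def node(*sources):
    """Build a collection of sources from index strings, e.g. node("1", "2")."""
    return frozenset(frozenset(int(ch) for ch in src) for src in sources)


def precedes(alpha, beta):
    """Lattice ordering: every B in beta contains some A in alpha."""
    return all(any(a <= b for a in alpha) for b in beta)


def partial_information(cumulative):
    """Invert cumulative redundancy values into PI-atoms using the recursion
    atom(alpha) = cumulative(alpha) - sum of atoms strictly below alpha."""
    atoms = {}

    def atom(alpha):
        if alpha not in atoms:
            strictly_below = [b for b in cumulative
                              if b != alpha and precedes(b, alpha)]
            atoms[alpha] = cumulative[alpha] - sum(atom(b) for b in strictly_below)
        return atoms[alpha]

    for alpha in cumulative:
        atom(alpha)
    return atoms


# Hypothetical cumulative values (bits) on the two-source lattice; not from the paper.
cumulative = {
    node("1", "2"): 0.2,  # bottom node {1}{2}
    node("1"): 0.5,
    node("2"): 0.6,
    node("12"): 1.0,      # top node {12}
}
atoms = partial_information(cumulative)
for alpha, value in atoms.items():
    print(sorted(sorted(a) for a in alpha), round(value, 6))
```

Summing the resulting atoms over all nodes at or below a given node recovers the cumulative value at that node, which is the defining relation the recursion inverts.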

The decomposition of mutual information into a sum of
PI-terms follows from

$$I(S; A) = I_{\min}(S; \{A\}) = \sum_{\beta \preceq \{A\}} \Pi_R(S; \beta). \tag{9}$$

For the 3-variable case $R = \{R_1, R_2\}$, Equation (9) yields

$$I(S; R_1) = \Pi_R(S; \{1\}) + \Pi_R(S; \{1\}\{2\}) \tag{10}$$

and

$$I(S; R_1, R_2) = \Pi_R(S; \{1\}) + \Pi_R(S; \{2\}) + \Pi_R(S; \{1\}\{2\}) + \Pi_R(S; \{12\}). \tag{11}$$

The relationship between these equations can be represented
as a partial information (PI) diagram (FIG. 3A), which illustrates
the way in which the total information that R provides
about S is distributed across various combinations of
sources. Furthermore, comparing this diagram with FIG. 1
makes immediately clear the meaning of each partial information
term. First, from Equation (8), we have that
$\Pi_R(S; \{1\}\{2\}) = I_{\min}(S; \{1\}\{2\})$, which, from the definition
of $I_{\min}$, corresponds to the redundancy for $R_1$ and $R_2$.
The unique information for $R_1$ is given by
$\Pi_R(S; \{1\}) = I(S; R_1) - I_{\min}(S; \{1\}\{2\})$, which is the total information
from $R_1$ minus the redundancy, and likewise for $R_2$. Finally,
the additional information provided by the combination of
$R_1$ and $R_2$ is given by $\Pi_R(S; \{12\})$, corresponding to their
synergy.

To fix ideas, consider the example in FIG. 4A. From the
symmetry of the distribution, it is clear that $R_1$ and $R_2$ must
provide the same amount of information about S. Indeed, this
is easily verified, with
$I(S; R_1) = I(S; R_2) = -\frac{1}{3} \log \frac{1}{3} - \frac{2}{3} \log \frac{2}{3}$.
However, it is also clear that $R_1$ and $R_2$ provide
information about different outcomes of S. In particular, given
knowledge of $R_1$ one can determine conclusively whether
or not outcome S = 2 occurs (which is not the case for $R_2$),
and likewise for $R_2$ and outcome S = 1. This feature is
captured by $\Pi_R(S; \{1\}) = \Pi_R(S; \{2\}) = \frac{1}{3}$, indicating that
$R_1$ and $R_2$ each provide $\frac{1}{3}$ bits of unique information about S.
The redundant information, $\Pi_R(S; \{1\}\{2\}) = \log 3 - \log 2$,
captures the fact that knowledge of either $R_1$ or $R_2$ reduces
uncertainty about S from three equally likely outcomes to
two. Finally, $R_1$ and $R_2$ also provide $\frac{1}{3}$ bits of synergistic
information, i.e., $\Pi_R(S; \{12\}) = \frac{1}{3}$. This value reflects the
fact that $R_1$ and $R_2$ together uniquely determine whether or
not outcome S = 0 occurs, which is not true for $R_1$ or $R_2$
alone.
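
The numbers in this example can be checked numerically. The sketch below assumes a joint distribution reconstructed from the verbal description of FIG. 4A (three equiprobable outcomes, with $R_1 = 1$ exactly when S = 2 and $R_2 = 1$ exactly when S = 1); the figure itself is not reproduced here, so this reconstruction, like the helper names, is an assumption of the example.

```python
from collections import defaultdict
from math import log2


def marginal(joint, idx):
    """Marginalize a joint pmf {outcome tuple: probability} onto positions idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def specific_information(joint, s, source):
    """I(S = s; A) in bits, with S stored at position 0 of each outcome."""
    source = tuple(source)
    p_s = marginal(joint, (0,))[(s,)]
    p_a = marginal(joint, source)
    p_sa = marginal(joint, (0,) + source)
    return sum((p / p_s) * (log2(1.0 / p_s) - log2(p_a[key[1:]] / p))
               for key, p in p_sa.items() if key[0] == s and p > 0)


def i_min(joint, sources):
    """Expected minimum specific information over the given sources."""
    return sum(p * min(specific_information(joint, s, A) for A in sources)
               for (s,), p in marginal(joint, (0,)).items())


# Assumed reconstruction of FIG. 4A: outcomes (s, r1, r2), each with probability 1/3.
fig4a = {(0, 0, 0): 1 / 3, (1, 0, 1): 1 / 3, (2, 1, 0): 1 / 3}

rdn = i_min(fig4a, [(1,), (2,)])                 # PI-atom {1}{2}
unq1 = i_min(fig4a, [(1,)]) - rdn                # PI-atom {1}
unq2 = i_min(fig4a, [(2,)]) - rdn                # PI-atom {2}
syn = i_min(fig4a, [(1, 2)]) - rdn - unq1 - unq2  # PI-atom {12}

print("redundancy:", rdn)
print("unique R1:", unq1, "unique R2:", unq2)
print("synergy:", syn)
```

Under this assumed distribution, the four atoms agree with the values quoted in the text and sum to the total information $I(S; R_1, R_2)$.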

Note that, unlike mutual information or interaction infor- 
mation, partial information is not symmetric. For instance, 
the synergistic information that $R_1$ and $R_2$ provide about S is
not in general equal to the synergistic information that S and
$R_2$ provide about $R_1$. This property is also illustrated by the
example in FIG. 4A. Given knowledge of S, one can uniquely
determine the outcome of $R_1$ (and $R_2$), so that S provides
complete information about both. Thus, it is not possible for the
combination of S and $R_2$ to provide any additional synergistic
information about $R_1$ since there is no remaining uncertainty
about $R_1$ when S is known. In contrast, as was just noted, $R_1$
and $R_2$ provide $\frac{1}{3}$ bits of synergistic information about S. This
asymmetry accounts for our decision to focus on information
about a particular variable S throughout, since in general the
analysis will differ depending on the variable of interest. Note
that total information is also asymmetric in this sense, i.e., in
general $I(S; R_1, R_2) \neq I(R_1; S, R_2)$ (though, of course, it is
symmetric in the sense that $I(S; R_1, R_2) = I(R_1, R_2; S)$).

The general structure of PI-diagrams becomes clear when we
consider the decomposition for four variables (FIG. 3B). First,
note that all of the possibilities for three variables are again
present for four. In particular, each element of R can provide
unique information (regions labeled {1}, {2}, and {3}),
information redundantly with one other variable ({1}{2}, {1}{3},
and {2}{3}), or information synergistically with one other
variable ({12}, {13}, and {23}). Additionally, information can
be provided redundantly by all three variables ({1}{2}{3})
or provided by their three-way synergy ({123}). More
interesting are the new kinds of terms representing combinations
of redundancy and synergy. For instance, the regions marked
{1}{23}, {2}{13}, and {3}{12} represent information that is
available redundantly from either one variable considered
individually or the other two considered together. Or, for instance,
the region labeled {12}{13}{23} represents the information
provided redundantly by the three possible two-way synergies.
In general, the PI-atom for a collection of sources corresponds
to the information provided redundantly by the synergies of all
sources in the collection. This point also clarifies our earlier
claim that unique information is best thought of as a degenerate
case: unique information corresponds to the combination of
first-order redundancy and first-order synergy.

FIG. 3. Partial information diagrams for (A) 3 and (B) 4 variables.

In general, a PI-diagram for n variables, S and
$R = \{R_1, R_2, \ldots, R_{n-1}\}$, consists of the following (see Fig. S2
in Appendix E). First, for each element $R_i \in R$ there is a
region corresponding to $I(S; R_i)$. Then, for every subset A
of R with two or more elements, $I(S; A)$ is depicted as a
region containing $I(S; R_i)$ for all $R_i \in A$ but not coextensive
with $\bigcup_{R_i \in A} I(S; R_i)$. The difference between $I(S; A)$ and
$\bigcup_{R_i \in A} I(S; R_i)$ represents the synergy for A, the information
gained from the combined knowledge of all elements in A
that is not available from any subset. In addition, regions of
the diagram intersect generically, representing all possibilities
for redundancy. In total, a PI-diagram is composed of the
(n - 1)-th Dedekind number [21] of PI-atoms, same as the
cardinality of $\mathcal{A}(R)$ (see Appendix C). As described above, each
PI-atom represents the redundancy of synergies for a particular
collection of sources, corresponding to one distinct way for the
components of R to contribute information about S.

Finally, it is instructive to consider the relationship between
the redundancy lattice and PI-diagram for n variables.
First, we note that $I_{\min}$ is analogous to set intersection for
PI-diagrams, consistent with the idea of redundancy as
overlapping information. Specifically, $I_{\min}(S; \{A_1, A_2, \ldots, A_k\})$
corresponds to the region $\bigcap_{i} I(S; A_i)$. From this correspondence
between $I_{\min}$ and set intersection, we can establish the
following connection: for $\alpha, \beta \in \mathcal{A}(R)$, $\alpha$ is lower than $\beta$ in
the redundancy lattice if and only if $\bigcap_{A \in \alpha} I(S; A)$ is a subset
of $\bigcap_{B \in \beta} I(S; B)$ in the PI-diagram. Consequently, the
redundancy lattice and PI-diagram can be viewed as complementary
representations of the same structure, with the PI-diagram a
collapsed version of the redundancy lattice formed by embedding
regions according to the lattice ordering.

FIG. 4. Probability distributions for $S \in \{0, 1, 2\}$ and $R_1, R_2 \in \{0, 1\}$.
Black tiles represent equiprobable outcomes. White tiles are
zero-probability outcomes.



V. WHY INTERACTION INFORMATION IS 
SOMETIMES NEGATIVE 

We next show how PI-decomposition can be used to understand
the conditions under which interaction information, the
standard generalization of mutual information to multivariate
interactions, is negative. The interaction information for three
variables is given by

$$I(S; R_1; R_2) = I(S; R_1 \mid R_2) - I(S; R_1) \tag{12}$$

and for n > 3 variables is defined recursively as

$$I(S; R_1; R_2; \ldots; R_{n-1}) = I(S; R_1; R_2; \ldots; R_{n-2} \mid R_{n-1}) - I(S; R_1; R_2; \ldots; R_{n-2}) \tag{13}$$

where the conditional interaction information is defined by 
simply including the conditioning in all terms of the original 
definition. Interaction information is symmetric for all per- 
mutations of its arguments, and is traditionally interpreted as 
the information shared by all n variables beyond that which is 
shared by any subset of those variables. 

For 3-variable interaction information, a positive value is 
naturally interpreted as indicating a situation in which any one 
variable of the system enhances the correlation between the
other two. For example, a positive value for Equation (12) indicates
that knowledge of $R_2$ enhances the correlation between S
and $R_1$ (and likewise for all other variable permutations). Thus,
in the terminology used here, a positive value for $I(S; R_1; R_2)$
indicates the presence of synergy. On the other hand, a negative
value for $I(S; R_1; R_2)$ indicates a situation in which any one
variable accounts for or "explains away" [22] the correlation
between the other two. In other words, a negative value for
$I(S; R_1; R_2)$ indicates redundancy. Indeed, $I(S; R_1; R_2)$ is a
widely used measure of synergy in neuroscience, where it is
interpreted in exactly this way [23-26].

The PI-decomposition for 3-variable interaction information
(FIG. 5A; see also Fig. S3 in Appendix E) confirms this
interpretation, with $I(S; R_1; R_2)$ equal to the difference between
the synergistic and the redundant information, i.e.,

$$I(S; R_1; R_2) = \Pi_R(S; \{12\}) - \Pi_R(S; \{1\}\{2\}). \tag{14}$$

Thus, it is indeed the case that positive values indicate synergy 
and negative values indicate redundancy. 

However, PI-decomposition also makes clear that
$I(S; R_1; R_2)$ confounds redundancy and synergy, with the
meaning of interaction information ambiguous for any system
that exhibits a mixture of the two (cf. [27], who suggest the
possibility of mixed redundancy and synergy, but without
attempting to disentangle them). For instance, consider again
the example in FIG. 4A. As described earlier, in this case $R_1$
and $R_2$ provide $\log 3 - \log 2$ bits of redundant information and
$\frac{1}{3}$ bits of synergistic information. Consequently, $I(S; R_1; R_2)$
is negative because there is more redundancy than synergy,
despite the fact that the system clearly exhibits synergistic
interactions. As a second example, consider the distribution in
FIG. 4B. In this case, $R_1$ and $R_2$ provide $\frac{1}{2}$ bits of redundant
information, corresponding to the fact that knowledge of
either $R_1$ or $R_2$ reduces uncertainty about the outcomes
S = 0 and S = 2. Additionally, $R_1$ and $R_2$ provide $\frac{1}{2}$ bits
of synergistic information, reflecting the fact that $R_1$ and
$R_2$ together provide complete information about outcomes
S = 0 and S = 2, which is not true for either alone. Thus, the
interaction information in this case is equal to zero despite
the presence of both redundant and synergistic interactions,
because redundancy and synergy are balanced.
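
The relation between 3-variable interaction information and the synergy and redundancy atoms can be checked on small distributions. The sketch below computes both sides independently, one from entropies and one from the redundancy measure. Two assumptions are made loudly here: `fig4a` is a reconstruction of FIG. 4A from its verbal description, and `balanced` is a hypothetical distribution chosen to have equal redundancy and synergy in the manner described for FIG. 4B; it is not claimed to be the distribution actually shown in that figure.

```python
from collections import defaultdict
from math import log2


def marginal(joint, idx):
    """Marginalize a joint pmf {outcome tuple: probability} onto positions idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return dict(out)


def entropy(joint, idx):
    """Shannon entropy (bits) of the variables at positions idx."""
    return -sum(p * log2(p) for p in marginal(joint, idx).values() if p > 0)


def interaction_information_3(joint):
    """I(S; R1; R2) = I(S; R1 | R2) - I(S; R1) for outcomes (s, r1, r2)."""
    conditional = (entropy(joint, (0, 2)) + entropy(joint, (1, 2))
                   - entropy(joint, (0, 1, 2)) - entropy(joint, (2,)))
    unconditional = entropy(joint, (0,)) + entropy(joint, (1,)) - entropy(joint, (0, 1))
    return conditional - unconditional


def specific_information(joint, s, source):
    """I(S = s; A) in bits, with S stored at position 0 of each outcome."""
    source = tuple(source)
    p_s = marginal(joint, (0,))[(s,)]
    p_a = marginal(joint, source)
    p_sa = marginal(joint, (0,) + source)
    return sum((p / p_s) * (log2(1.0 / p_s) - log2(p_a[key[1:]] / p))
               for key, p in p_sa.items() if key[0] == s and p > 0)


def i_min(joint, sources):
    """Expected minimum specific information over the given sources."""
    return sum(p * min(specific_information(joint, s, A) for A in sources)
               for (s,), p in marginal(joint, (0,)).items())


def synergy_and_redundancy(joint):
    """PI-atoms {12} and {1}{2} for a system with two sources."""
    rdn = i_min(joint, [(1,), (2,)])
    unq1 = i_min(joint, [(1,)]) - rdn
    unq2 = i_min(joint, [(2,)]) - rdn
    syn = i_min(joint, [(1, 2)]) - rdn - unq1 - unq2
    return syn, rdn


# Assumed reconstruction of FIG. 4A, and a hypothetical balanced distribution.
fig4a = {(0, 0, 0): 1 / 3, (1, 0, 1): 1 / 3, (2, 1, 0): 1 / 3}
balanced = {(0, 0, 0): 0.25, (2, 0, 1): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25}

for name, joint in (("fig4a", fig4a), ("balanced", balanced)):
    syn, rdn = synergy_and_redundancy(joint)
    print(name, interaction_information_3(joint), syn - rdn)
```

For both distributions the entropy-based value and the difference between synergy and redundancy coincide, negative in the first case and zero in the second.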

FIG. 5. PI-decomposition of interaction information for (A) 3 and (B)
4 variables. Blue and red regions represent PI-terms that are added
and subtracted, respectively. The green region in (B) represents a
PI-term that is subtracted twice.

The situation is worse for four-variable interaction information,
which is known to violate the interpretation that positive
values indicate (pure) synergy and negative values indicate
(pure) redundancy [12, 28]. To demonstrate, consider
the case of 3-parity, which is the higher-order form of the
exclusive-OR, or 2-parity, function mentioned earlier. In this
case, we have a system of four binary random variables, S
and $R = \{R_1, R_2, R_3\}$, where the eight outcomes for R are
equiprobable and $S = R_1 \oplus R_2 \oplus R_3$. Intuitively, this
corresponds to a case of pure synergy, since the value of S can
be determined only when all of the $R_i$ are known. Indeed,
using Eq. (13) we find that $I(S; R_1; R_2; R_3)$ for this system
is equal to +1 bit, as expected from the interpretation that
positive values indicate synergy. However, now consider a
second system of binary variables, this time where the two
outcomes of S are equiprobable and $R_1$, $R_2$, and $R_3$ are all
copies of S. Clearly this corresponds to a case of pure
redundancy, since the value of S can be determined uniquely from
knowledge of any $R_i$, but $I(S; R_1; R_2; R_3)$ for this system is
again equal to +1 bit, same as the case of pure synergy. Thus,
a completely redundant system is assigned a positive value for
the interaction information, in clear violation of the idea that
redundancy is indicated by negative values. Worse still, the
4-variable interaction information fails to distinguish between
the polar opposites of purely synergistic and purely redundant
information.
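
Both four-variable cases can be reproduced with a direct implementation of the recursive definition of interaction information. The sketch below is a minimal illustration: the function names are hypothetical, joint distributions are dictionaries from outcome tuples `(s, r1, r2, r3)` to probabilities, and conditional mutual information is computed from joint entropies.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(joint, idx):
    """Shannon entropy (bits) of the variables at positions idx."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def conditional_mutual_information(joint, x, y, cond=()):
    """I(X; Y | C) from joint entropies; C may be empty."""
    c = tuple(cond)
    return (entropy(joint, (x,) + c) + entropy(joint, (y,) + c)
            - entropy(joint, (x, y) + c) - entropy(joint, c))


def interaction_information(joint, variables, cond=()):
    """Recursive definition: condition on the last variable, then subtract
    the unconditioned lower-order interaction information."""
    variables = tuple(variables)
    if len(variables) == 2:
        return conditional_mutual_information(joint, variables[0], variables[1], cond)
    head, last = variables[:-1], variables[-1]
    return (interaction_information(joint, head, tuple(cond) + (last,))
            - interaction_information(joint, head, cond))


# 3-parity: eight equiprobable inputs, S is their exclusive-OR.
parity = {(r1 ^ r2 ^ r3, r1, r2, r3): 1 / 8
          for r1, r2, r3 in product((0, 1), repeat=3)}
# Pure redundancy: R1, R2, and R3 are all copies of a fair binary S.
copies = {(0, 0, 0, 0): 0.5, (1, 1, 1, 1): 0.5}

print(interaction_information(parity, (0, 1, 2, 3)))
print(interaction_information(copies, (0, 1, 2, 3)))
```

Both calls evaluate to +1 bit, so the purely synergistic and purely redundant systems are indistinguishable by this measure, as stated above.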

The PI-decomposition for 4-variable interaction information
(FIG. 5B; see also Fig. S4 in Appendix E) clarifies why this is
the case. In terms of PI-atoms, $I(S; R_1; R_2; R_3)$ is given by

$$\begin{aligned}
&\Pi_R(S; \{123\}) + \Pi_R(S; \{1\}\{2\}\{3\}) \\
&- \Pi_R(S; \{1\}\{23\}) - \Pi_R(S; \{2\}\{13\}) - \Pi_R(S; \{3\}\{12\}) \\
&- \Pi_R(S; \{12\}\{13\}) - \Pi_R(S; \{12\}\{23\}) - \Pi_R(S; \{13\}\{23\}) \\
&- 2 \times \Pi_R(S; \{12\}\{13\}\{23\}).
\end{aligned} \tag{15}$$

Thus, $I(S; R_1; R_2; R_3)$ is equal to the sum of third-order
synergy ({123}) and third-order redundancy ({1}{2}{3}), minus
the information provided redundantly by a first- and
second-order synergy ({1}{23}, {2}{13}, and {3}{12}), minus the
information provided redundantly by two second-order synergies
({12}{13}, {12}{23}, and {13}{23}), and minus twice
the information provided redundantly by all three second-order
synergies ({12}{13}{23}). Thus, systems with pure synergy
and pure redundancy have the same value for $I(S; R_1; R_2; R_3)$
because 4-variable interaction information adds in the
highest-order synergy and redundancy terms. More generally, the
PI-decomposition for $I(S; R_1; R_2; R_3)$ shows why it is difficult
to interpret as a meaningful quantity, and as one might expect
the story only becomes more complicated in higher dimensions.
Thus, although one can readily decompose interaction
information into a collection of partial information contributions,
and understand the conditions under which it will be
positive or negative depending on the relative magnitudes of
these contributions, the utility of interaction information for
larger systems is unclear.

VI. DISCUSSION 

The main objective of this paper has been to quantify multi- 
variate information in such a way that the structure of variable 
interactions is illuminated. This was accomplished by first 
defining a general measure of redundant information, $I_{\min}$,
which satisfies a number of intuitive properties for a measure
of redundancy. Next, it was shown that $I_{\min}$ induces a lattice
structure over the set of possible information sources, referred 
to as the redundancy lattice, which characterizes the distinct 
ways that information can be distributed amongst a set of 
sources. From this lattice, a measure of partial information 
was derived that captures the unique information contributed 
by each possible combination of sources. It was then shown 
that mutual information decomposes into a sum of these partial 
information terms, so that the total information provided by 
a source is broken down into a collection of partial informa- 
tion contributions. Moreover, it was demonstrated that each of 
these terms supports clear interpretation as a particular com- 
bination of redundant and synergistic interactions between 
specific subsets of variables. Finally, we discussed the relation- 
ship between partial information decomposition and interaction 
information, the current de facto measure of multivariate inter- 
actions, and used partial information to clarify the confusing 
property that interaction information is sometimes negative. 

One obvious challenge with applying these ideas is that the
number of partial information terms grows rapidly for larger
systems. For instance, with 9 variables there are more than
$5 \times 10^{22}$ possibilities [29], and beyond that the Dedekind
numbers are not even currently known. Thus, an important
direction for future work is to determine efficient ways of
calculating partial information terms for larger systems. To this end,
the lattice structure of the terms is likely to play an essential
role. As with any ordered data structure, the fact that the space
of possibilities is highly organized can be readily exploited for
efficient use. As a simple example, if $I_{\min}$ is calculated
in a descending fashion over the nodes of the redundancy
lattice and at a certain juncture has a value of zero, all of the
terms below that node can immediately be eliminated simply
from the monotonicity of $I_{\min}$ (see Appendix D). Moreover, if
the Markov property or any other constraints hold between the
variables, many of the possible partial information terms can
also be excluded. Finally, these considerations notwithstanding,
it should also be emphasized that 3-variable interaction
is the current state of the art, and thus even the simplest form
of partial information decomposition can be used to address a
number of outstanding questions.

In physics, for example, 3-variable interactions have been
explored in relation to the non-separability of quantum systems
[30] and in the study of many-body correlation effects [31]. In
neuroscience, the concepts of synergy and redundancy for three
variables have been examined in the context of neural coding in
a number of theoretical and empirical investigations [23-26, 32,
33]. In genetics, multivariate dependencies arise in the analysis
of gene-gene and gene-environment interactions in studies of
human disease susceptibility [28, 34, 35]. Moreover, similar
issues have also been explored in machine learning [22, 27, 36],
ecology [37], quantum information theory [38], information
geometry [39], rough set analysis [40], and cooperative game
theory [41]. Thus, in all of these cases, the 3-variable form of
partial information decomposition can be applied immediately
to illuminate the structure of multivariate dependencies, while
the general form provides a clear way forward in the study of
more complex systems of interactions.

Appendix A: Measures of Specific Information

Measures of specific information are discussed in [42] in
the context of quantifying the information that specific neural
responses provide about a stimulus ensemble. For random
variables $S$ and $R$, representing stimuli and responses,
respectively, the information that $R$ provides about $S$ is decomposed
according to

$$I(S; R) = \sum_{r} p(r)\, i_r(r) \tag{A1}$$

and

$$i_r(r) = H(S) - H(S \mid r) \tag{A2}$$

where $H(S)$ is the entropy of $S$ and $i_r(r)$ is the
response-specific information associated with each $r \in R$. The
response-specific information quantifies the change in uncertainty about
$S$ when response $r$ is observed. In [42], it is shown that $i_r$
is the unique measure of specific information that satisfies
additivity, though it is also possible for $i_r$ to be negative.

To distinguish the different role played by stimuli as opposed
to responses, an alternative measure of specific information
is proposed in [43]. The stimulus-specific information for an
outcome $s \in S$ is defined as

$$i_s(s) = \sum_{r \in R} p(r \mid s)\, i_r(r). \tag{A3}$$

Like the response-specific information, the weighted average of
$i_s(s)$ gives the mutual information $I(S; R)$. Stimulus-specific
information quantifies the extent to which a particular stimulus
$s$ tends to evoke responses that are informative about the entire
ensemble $S$ (responses with high values for $i_r$).

Finally, both [42] and [43] also discuss $I(S = s; R)$, the
measure of specific information used here (Eq. (2)). In [43],
$I(S = s; R)$ is described as the reduction in surprise of a
particular stimulus $s$ gained from each response, averaged over all
responses associated with that stimulus. Thus, whereas $i_s(s)$
weights each response $r$ according to the information that it
contributes about the entire ensemble $S$, $I(S = s; R)$ quantifies
only the information that $R$ provides about the particular
outcome $S = s$. In [42], it is proven that $I(S = s; R)$ is the
only measure of specific information that is strictly nonnegative.
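
The three measures can be compared numerically. The sketch below is a
minimal illustration, assuming a small hypothetical joint distribution
$p(s, r)$ chosen so that one response is "misleading"; the helper names
are not from the text, and $I(S = s; R)$ is computed in the form
$\sum_r p(r \mid s) \log_2 [p(s \mid r) / p(s)]$, which is equivalent to
the Kullback-Leibler form used in Appendix D.

```python
from math import log2

# Hypothetical joint distribution p(s, r), for illustration only.
joint = {("s1", "r1"): 0.8, ("s1", "r2"): 0.1, ("s2", "r2"): 0.1}

def marginal(joint, axis):
    """Marginal distribution over the given position of the outcome tuple."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

p_s, p_r = marginal(joint, 0), marginal(joint, 1)

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def i_r(r):
    """Response-specific information, Eq. (A2): H(S) - H(S | r)."""
    posterior = [joint.get((s, r), 0.0) / p_r[r] for s in p_s]
    return entropy(p_s.values()) - entropy(posterior)

def i_s(s):
    """Stimulus-specific information, Eq. (A3): sum_r p(r|s) i_r(r)."""
    return sum(joint.get((s, r), 0.0) / p_s[s] * i_r(r) for r in p_r)

def specific_info(s):
    """I(S = s; R) = sum_r p(r|s) log2[p(s|r) / p(s)]."""
    total = 0.0
    for r in p_r:
        p_sr = joint.get((s, r), 0.0)
        if p_sr > 0:
            total += (p_sr / p_s[s]) * log2((p_sr / p_r[r]) / p_s[s])
    return total

# Each weighted average recovers the same mutual information I(S; R).
mi_from_i_r = sum(p_r[r] * i_r(r) for r in p_r)
mi_from_i_s = sum(p_s[s] * i_s(s) for s in p_s)
mi_from_spec = sum(p_s[s] * specific_info(s) for s in p_s)
print(mi_from_i_r, mi_from_i_s, mi_from_spec)
print({r: i_r(r) for r in p_r})            # i_r("r2") is negative here
print({s: specific_info(s) for s in p_s})  # both values are nonnegative
```

In this example the response `r2` leaves the posterior over stimuli more
uncertain than the prior, so $i_r$ is negative for it, while
$I(S = s; R)$ remains nonnegative for every $s$.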



Appendix B: Lattice Theory Definitions 

Here we review only the basic concepts of lattice theory
needed for supporting proofs. For a thorough treatment, see
[44, 45].

Definition 1. A pair $\langle X, \leq \rangle$ is a partially ordered
set or poset if $\leq$ is a binary relation on $X$ that is reflexive,
transitive and antisymmetric.

Definition 2. Let $Y \subseteq X$. Then $a \in Y$ is a maximal element
in $Y$ if for all $b \in Y$, $a \leq b \Rightarrow a = b$. A minimal
element is defined dually. We denote the set of maximal elements of $Y$
by $\overline{Y}$ and the set of minimal elements by $\underline{Y}$.

FIG. S1: Basic lattice-theoretic concepts. (A) Hasse diagram of the lattice $\langle \mathcal{P}(X), \subseteq \rangle$ for $X = \{1, 2, 3\}$. (B) An example of a chain (blue
nodes) and an antichain (red nodes). (C) The top $\top$ and bottom $\bot$ are shown in gray. Green nodes correspond to $\{1, 2, 3\}^{-}$, the set of elements
covered by $\{1, 2, 3\}$. The orange region represents $\downarrow\!\{1, 3\}$, the down-set of $\{1, 3\}$.



Definition 3. Let $\langle X, \leq \rangle$ be a poset, and let
$Y \subseteq X$. An element $x \in X$ is an upper bound for $Y$ if for
all $y \in Y$, $y \leq x$. A lower bound for $Y$ is defined dually.

Definition 4. An element $x \in X$ is the least upper bound or
supremum for $Y$, denoted $\sup Y$, if $x$ is an upper bound of
$Y$ and $x \leq z$ for every upper bound $z \in X$ of $Y$.
The greatest lower bound or infimum for $Y$, denoted $\inf Y$, is
defined dually.

Definition 5. A poset $\langle X, \leq \rangle$ is a lattice if and
only if for all $x, y \in X$ both $\inf\{x, y\}$ and $\sup\{x, y\}$
exist in $X$. If $\langle X, \leq \rangle$ is a lattice, it is common to
write $x \wedge y$, the meet of $x$ and $y$, and $x \vee y$, the join
of $x$ and $y$, for $\inf\{x, y\}$ and $\sup\{x, y\}$, respectively.
For $Y \subseteq X$, we use $\bigwedge Y$ and $\bigvee Y$ to denote the
meet and join of all elements in $Y$, respectively.

Definition 6. For $a, b \in X$, we say that $a$ is covered by $b$ (or
$b$ covers $a$) if $a < b$ and $a \leq c < b \Rightarrow a = c$. The set
of elements that are covered by $b$ is denoted by $b^{-}$.

The classic example of a lattice is the power set of a set $X$
ordered by inclusion, denoted $\langle \mathcal{P}(X), \subseteq \rangle$.
Lattices are naturally represented by Hasse diagrams, in which nodes
correspond to members of $X$ and an edge exists between elements $x$ and
$y$ if $x$ covers $y$. FIG. S1A depicts the Hasse diagram for the lattice
$\langle \mathcal{P}(X), \subseteq \rangle$ with $X = \{1, 2, 3\}$.

Definition 7. If $\langle X, \leq \rangle$ is a poset, $Y \subseteq X$
is a chain if for all $a, b \in Y$ either $a \leq b$ or $b \leq a$.
$Y$ is an antichain if for all $a, b \in Y$, $a \leq b$ only if $a = b$.

FIG. S1B shows examples of a chain and an antichain.

Definition 8. If there exists an element $\bot \in X$ with the
property that $\bot \leq x$ for all $x \in X$, we call $\bot$ the bottom
element of $X$. The top element of $X$, denoted by $\top$, is defined
dually.

Definition 9. For any $x \in X$, we define

$$\downarrow\! x = \{y \in X : y \leq x\} \quad \text{and} \quad \dot{\downarrow} x = \{y \in X : y < x\}$$

where $\downarrow\! x$ and $\dot{\downarrow} x$ are called the down-set
and strict down-set of $x$, respectively.

FIG. S1C illustrates the concepts of top and bottom elements,
covering relations, and down-sets.

Appendix C: $\mathcal{A}(\mathbf{R})$ and the Redundancy Lattice

Formally, $\mathcal{A}(\mathbf{R})$ corresponds to the set of antichains
on the lattice $\langle \mathcal{P}(\mathbf{R}), \subseteq \rangle$
(excluding the empty set). The cardinality of this set for
$|\mathbf{R}| = n - 1$ is given by the $(n - 1)$-th Dedekind
number, which for $n = 2, 3, 4, \ldots$ is 1, 4, 18, 166, 7579, ...
([21], p. 273). The fact that $\langle \mathcal{A}(\mathbf{R}), \preceq \rangle$
forms a lattice, which we call the redundancy lattice, is proven in
[46], where the corresponding lattice is denoted
$\langle \mathcal{A}(X), \preceq' \rangle$ (see also [47]).
As shown in [46], the meet ($\wedge$) and join ($\vee$) for this
lattice are given by

$$\alpha \wedge \beta = \underline{\alpha \cup \beta} \tag{A4}$$

and

$$\alpha \vee \beta = \underline{\uparrow\!\alpha \cap \uparrow\!\beta}. \tag{A5}$$
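
The counts and the meet and join formulas can be checked by brute force
for small $|\mathbf{R}|$. The sketch below is illustrative only: it
assumes that $\uparrow\!\alpha$ denotes the set of all source sets
containing some member of $\alpha$, and it takes the ordering to be
"$\alpha \preceq \beta$ iff every $\mathbf{B} \in \beta$ contains some
$\mathbf{A} \in \alpha$", as that ordering is used in the proofs of
Appendix D. All function names are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(n):
    items = range(1, n + 1)
    return [frozenset(c) for k in range(1, n + 1)
            for c in combinations(items, k)]

def is_antichain(sets):
    # No member may be a proper subset of another.
    return all(not (a < b or b < a) for a, b in combinations(sets, 2))

def antichains(n):
    """Nonempty antichains of nonempty subsets of {1..n} (brute force)."""
    subsets = nonempty_subsets(n)
    found = []
    for mask in range(1, 2 ** len(subsets)):
        chosen = [s for i, s in enumerate(subsets) if (mask >> i) & 1]
        if is_antichain(chosen):
            found.append(frozenset(chosen))
    return found

def minimal(sets):
    """Minimal elements under inclusion (the underline operator)."""
    sets = set(sets)
    return frozenset(a for a in sets if not any(b < a for b in sets))

def up_set(alpha, n):
    """Assumed reading of the up-arrow: subsets containing a member of alpha."""
    return {b for b in nonempty_subsets(n) if any(a <= b for a in alpha)}

def meet(alpha, beta):
    return minimal(alpha | beta)                          # Eq. (A4)

def join(alpha, beta, n):
    return minimal(up_set(alpha, n) & up_set(beta, n))    # Eq. (A5)

def leq(alpha, beta):
    """alpha precedes beta iff every B in beta contains some A in alpha."""
    return all(any(a <= b for a in alpha) for b in beta)

sizes = [len(antichains(n)) for n in range(1, 5)]
print(sizes)  # [1, 4, 18, 166]

a = frozenset({frozenset({1})})
b = frozenset({frozenset({2})})
m = meet(a, b)      # the antichain {1}{2}
j = join(a, b, 2)   # the antichain {12}
```

The brute-force enumeration is only practical for a handful of sources,
which is consistent with the rapid growth of the sequence quoted above.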

Appendix D: Supporting Proofs 

Theorem 1. $I(S = s; \mathbf{A})$ is nonnegative.

Proof.

$$I(S = s; \mathbf{A}) = D\big(p(a \mid s) \,\|\, p(a)\big) \geq 0$$

where $D$ is the Kullback-Leibler distance and the last step
follows from the information inequality ([15], p. 26). □

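Lemma 1. $I(S = s; \mathbf{A})$ increases monotonically on the lattice
$\langle \mathcal{P}(\mathbf{R}), \subseteq \rangle$.

Proof. Consider $\mathbf{A}, \mathbf{B}$ with $\mathbf{A} \subset \mathbf{B} \subseteq \mathbf{R}$. Let $\mathbf{C} = \mathbf{B} \setminus \mathbf{A} \neq \emptyset$. Then we have

$$\begin{aligned}
I(S = s; \mathbf{B}) - I(S = s; \mathbf{A})
&= \sum_{b} p(b \mid s) \log \frac{p(s, b)}{p(s)\, p(b)} - \sum_{a} p(a \mid s) \log \frac{p(s, a)}{p(s)\, p(a)} \\
&= \sum_{a} \sum_{c} p(a, c \mid s) \log \frac{p(s, a, c)}{p(s)\, p(a, c)} - \sum_{a} \sum_{c} p(a, c \mid s) \log \frac{p(s, a)}{p(s)\, p(a)} \\
&= \sum_{a} \sum_{c} p(a, c \mid s) \log \frac{p(s, c \mid a)}{p(s \mid a)\, p(c \mid a)} \\
&= \sum_{a} p(a \mid s) \sum_{c} p(c \mid a, s) \log \frac{p(c \mid a, s)}{p(c \mid a)} \\
&= \sum_{a} p(a \mid s)\, D\big(p(c \mid a, s) \,\|\, p(c \mid a)\big) \geq 0.
\end{aligned}$$

□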
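
Theorem 1 and Lemma 1 can be spot-checked numerically. The following
sketch is only an illustration on one randomly generated joint
distribution (the seed, alphabet sizes, and helper name are arbitrary
assumptions); it computes $I(S = s; \mathbf{A})$ in the
Kullback-Leibler form of Theorem 1 for $\mathbf{A} = \{R_1\}$ and
$\mathbf{B} = \{R_1, R_2\}$.

```python
import random
from math import log2

def specific_info(joint, s, src_idx):
    """I(S = s; A) = sum_a p(a|s) log2[p(a|s) / p(a)].
    `joint` maps (s, r1, r2, ...) to a probability; `src_idx` lists the
    1-based source positions that make up A."""
    p_s, p_a, p_sa = 0.0, {}, {}
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in src_idx)
        p_a[a] = p_a.get(a, 0.0) + p
        if outcome[0] == s:
            p_s += p
            p_sa[a] = p_sa.get(a, 0.0) + p
    return sum((p / p_s) * log2((p / p_s) / p_a[a])
               for a, p in p_sa.items() if p > 0)

random.seed(0)  # arbitrary seed, used only for reproducibility
weights = {(s, r1, r2): random.random()
           for s in range(3) for r1 in range(2) for r2 in range(2)}
norm = sum(weights.values())
joint = {outcome: w / norm for outcome, w in weights.items()}

checks = []
for s in range(3):
    i_a = specific_info(joint, s, (1,))      # A = {R1}
    i_b = specific_info(joint, s, (1, 2))    # B = {R1, R2}, a superset of A
    checks.append((i_a, i_b))
    print(s, i_a, i_b)
```

For every outcome the first value should be nonnegative and no larger
than the second, up to floating-point error.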



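Theorem 2. $I_{\min}$ increases monotonically on the lattice
$\langle \mathcal{A}(\mathbf{R}), \preceq \rangle$.

Proof. We proceed by contradiction. Assume there exist
$\alpha, \beta \in \mathcal{A}(\mathbf{R})$ with $\alpha \preceq \beta$ and
$I_{\min}(S; \beta) < I_{\min}(S; \alpha)$.
Then, from Eq. (3), there must exist $\mathbf{B} \in \beta$ such that
$I(S = s; \mathbf{B}) < I(S = s; \mathbf{A})$ for some outcome $s \in S$
and for all $\mathbf{A} \in \alpha$. Thus, from Lemma 1, there does not
exist $\mathbf{A} \in \alpha$ such that $\mathbf{A} \subseteq \mathbf{B}$.
However, since $\alpha \preceq \beta$ by assumption, there
exists $\mathbf{A} \in \alpha$ such that $\mathbf{A} \subseteq \mathbf{B}$. □

Theorem 3. $\Pi_{\mathbf{R}}$ can be stated in closed form as

$$\Pi_{\mathbf{R}}(S; \alpha) = I_{\min}(S; \alpha) - \sum_{k=1}^{|\alpha^{-}|} (-1)^{k-1} \sum_{\substack{\mathcal{B} \subseteq \alpha^{-} \\ |\mathcal{B}| = k}} I_{\min}\Big(S; \bigwedge \mathcal{B}\Big). \tag{A6}$$

Proof. For $\mathcal{B} \subseteq \mathcal{A}(\mathbf{R})$, define the set-additive function $f$ as

$$f(\mathcal{B}) = \sum_{\beta \in \mathcal{B}} \Pi_{\mathbf{R}}(S; \beta).$$

From Eq. (6), it follows that $I_{\min}(S; \alpha) = f(\downarrow\!\alpha)$ and

$$\begin{aligned}
\Pi_{\mathbf{R}}(S; \alpha) &= f(\downarrow\!\alpha) - f(\dot{\downarrow}\alpha) \\
&= f(\downarrow\!\alpha) - f\Big(\bigcup_{\beta \in \alpha^{-}} \downarrow\!\beta\Big).
\end{aligned}$$

Applying the principle of inclusion-exclusion ([20], p. 64), we
have

$$\Pi_{\mathbf{R}}(S; \alpha) = f(\downarrow\!\alpha) - \sum_{k=1}^{|\alpha^{-}|} (-1)^{k-1} \sum_{\substack{\mathcal{B} \subseteq \alpha^{-} \\ |\mathcal{B}| = k}} f\Big(\bigcap_{\gamma \in \mathcal{B}} \downarrow\!\gamma\Big).$$

It is a basic result of lattice theory that for any lattice $L$
and $A \subseteq L$, $\bigcap_{a \in A} \downarrow\! a = \downarrow\!\big(\bigwedge A\big)$ ([44], p. 57), so we have

$$\begin{aligned}
\Pi_{\mathbf{R}}(S; \alpha) &= f(\downarrow\!\alpha) - \sum_{k=1}^{|\alpha^{-}|} (-1)^{k-1} \sum_{\substack{\mathcal{B} \subseteq \alpha^{-} \\ |\mathcal{B}| = k}} f\Big(\downarrow\!\bigwedge \mathcal{B}\Big) \\
&= I_{\min}(S; \alpha) - \sum_{k=1}^{|\alpha^{-}|} (-1)^{k-1} \sum_{\substack{\mathcal{B} \subseteq \alpha^{-} \\ |\mathcal{B}| = k}} I_{\min}\Big(S; \bigwedge \mathcal{B}\Big).
\end{aligned}$$

□

Lemma 2 (Maximum-minimums identity). Let $A$ be a set of
numbers. The maximum-minimums identity states that

$$\max A = \sum_{k=1}^{|A|} (-1)^{k-1} \sum_{\substack{B \subseteq A \\ |B| = k}} \min B$$

or conversely,

$$\min A = \sum_{k=1}^{|A|} (-1)^{k-1} \sum_{\substack{B \subseteq A \\ |B| = k}} \max B.$$

Proof. It is proven in a number of introductory texts, e.g. [48]. □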
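
The identity of Lemma 2 is easy to verify on a small example. The
sketch below is illustrative (the function names and the sample values
are arbitrary); it treats the input as an indexed collection, so
repeated values are handled as separate elements.

```python
from itertools import combinations

def max_via_minimums(values):
    """Alternating sum of subset minima, grouped by subset size k."""
    values = list(values)
    total = 0
    for k in range(1, len(values) + 1):
        sign = (-1) ** (k - 1)
        total += sign * sum(min(sub) for sub in combinations(values, k))
    return total

def min_via_maximums(values):
    """Converse form: alternating sum of subset maxima."""
    values = list(values)
    total = 0
    for k in range(1, len(values) + 1):
        sign = (-1) ** (k - 1)
        total += sign * sum(max(sub) for sub in combinations(values, k))
    return total

sample = [3, 1, 4, 1, 5]
print(max_via_minimums(sample), max(sample))
print(min_via_maximums(sample), min(sample))
```

The sum ranges over all nonempty subsets, so this direct evaluation is
exponential in the number of values and serves only as a check.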
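Theorem 4. $\Pi_{\mathbf{R}}$ can be stated in closed form as

$$\Pi_{\mathbf{R}}(S; \alpha) = I_{\min}(S; \alpha) - \sum_{s} p(s) \max_{\beta \in \alpha^{-}} \min_{\mathbf{B} \in \beta} I(S = s; \mathbf{B}). \tag{A7}$$

Proof. Combining Eqs. (A6) and (3) yields

$$\begin{aligned}
\Pi_{\mathbf{R}}(S; \alpha) &= I_{\min}(S; \alpha) - \sum_{k=1}^{|\alpha^{-}|} (-1)^{k-1} \sum_{\substack{\mathcal{B} \subseteq \alpha^{-} \\ |\mathcal{B}| = k}} \sum_{s} p(s) \min_{\mathbf{B} \in \bigwedge \mathcal{B}} I(S = s; \mathbf{B}) \\
&= I_{\min}(S; \alpha) - \sum_{s} p(s) \sum_{k=1}^{|\alpha^{-}|} (-1)^{k-1} \sum_{\substack{\mathcal{B} \subseteq \alpha^{-} \\ |\mathcal{B}| = k}} \min_{\mathbf{B} \in \bigwedge \mathcal{B}} I(S = s; \mathbf{B})
\end{aligned}$$

and by Lemma 1 and Eq. (A4),

$$\Pi_{\mathbf{R}}(S; \alpha) = I_{\min}(S; \alpha) - \sum_{s} p(s) \sum_{k=1}^{|\alpha^{-}|} (-1)^{k-1} \sum_{\substack{\mathcal{B} \subseteq \alpha^{-} \\ |\mathcal{B}| = k}} \min_{\beta \in \mathcal{B}} \min_{\mathbf{B} \in \beta} I(S = s; \mathbf{B}).$$

Then, applying Lemma 2 we have

$$\Pi_{\mathbf{R}}(S; \alpha) = I_{\min}(S; \alpha) - \sum_{s} p(s) \max_{\beta \in \alpha^{-}} \min_{\mathbf{B} \in \beta} I(S = s; \mathbf{B}).$$

□

Theorem 5. $\Pi_{\mathbf{R}}$ is nonnegative.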



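Proof. If $\alpha = \bot$, $\Pi_{\mathbf{R}}(S; \alpha) = I_{\min}(S; \alpha)$ and $\Pi_{\mathbf{R}}(S; \alpha) \geq 0$ follows from the nonnegativity of $I_{\min}$. To prove it for $\alpha \neq \bot$, we
proceed by contradiction. Assume there exists $\alpha \in \mathcal{A}(\mathbf{R}) \setminus \{\bot\}$ such that $\Pi_{\mathbf{R}}(S; \alpha) < 0$. Applying Eq. (3) to Theorem 4 and
combining summations yields

$$\Pi_{\mathbf{R}}(S; \alpha) = \sum_{s} p(s) \Big\{ \min_{\mathbf{A} \in \alpha} I(S = s; \mathbf{A}) - \max_{\beta \in \alpha^{-}} \min_{\mathbf{B} \in \beta} I(S = s; \mathbf{B}) \Big\}.$$

From this equation, it is clear that there must exist $\beta \in \alpha^{-}$ such that for all $\mathbf{B} \in \beta$, $I(S = s; \mathbf{A}) < I(S = s; \mathbf{B})$ for some
outcome $s \in S$ and some $\mathbf{A} \in \alpha$. Thus, from Lemma 1, there does not exist $\mathbf{B} \in \beta$ such that $\mathbf{B} \subseteq \mathbf{A}$. However, since $\beta \prec \alpha$ by
definition, there exists $\mathbf{B} \in \beta$ such that $\mathbf{B} \subseteq \mathbf{A}$. □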
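
The closed form of Theorem 4 can be evaluated directly for two sources.
The sketch below is a minimal illustration, not a general
implementation: the four-node redundancy lattice and its covering
relations are written out by hand, $I_{\min}$ is taken to be
$\sum_s p(s) \min_{\mathbf{A} \in \alpha} I(S = s; \mathbf{A})$ as in the
proof of Theorem 5, and the XOR and copy distributions are hypothetical
test cases. All identifiers are assumptions made for the example.

```python
from math import log2

def specific_info(joint, s, src_idx):
    """I(S = s; A) = sum_a p(a|s) log2[p(a|s) / p(a)], A given by src_idx."""
    p_s, p_a, p_sa = 0.0, {}, {}
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in src_idx)
        p_a[a] = p_a.get(a, 0.0) + p
        if outcome[0] == s:
            p_s += p
            p_sa[a] = p_sa.get(a, 0.0) + p
    return sum((p / p_s) * log2((p / p_s) / p_a[a])
               for a, p in p_sa.items() if p > 0)

# Two-source redundancy lattice: each node alpha (a tuple of source sets)
# is mapped to the list of nodes that it covers (alpha^-).
COVERS = {
    ((1,), (2,)): [],                      # bottom: redundancy {1}{2}
    ((1,),): [((1,), (2,))],               # unique to R1
    ((2,),): [((1,), (2,))],               # unique to R2
    ((1, 2),): [((1,),), ((2,),)],         # top: synergy {12}
}

def partial_info(joint, alpha):
    """Eq. (A7), with the max term taken as 0 at the bottom node."""
    p_s = {}
    for outcome, p in joint.items():
        p_s[outcome[0]] = p_s.get(outcome[0], 0.0) + p
    total = 0.0
    for s, ps in p_s.items():
        i_min_s = min(specific_info(joint, s, A) for A in alpha)
        below = [min(specific_info(joint, s, B) for B in beta)
                 for beta in COVERS[alpha]]
        total += ps * (i_min_s - (max(below) if below else 0.0))
    return total

def decompose(joint):
    return {alpha: partial_info(joint, alpha) for alpha in COVERS}

xor = {(r1 ^ r2, r1, r2): 0.25 for r1 in (0, 1) for r2 in (0, 1)}
copy = {(b, b, b): 0.5 for b in (0, 1)}

pi_xor = decompose(xor)    # all information lands on the {12} node
pi_copy = decompose(copy)  # all information lands on the {1}{2} node
print(pi_xor)
print(pi_copy)
```

In both cases the four terms are nonnegative and sum to
$I(S; R_1, R_2)$, which is 1 bit for each of these toy systems.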
Appendix E: Supplementary Figures

FIG. S2: Constructing a PI-diagram for 4 variables. (A) For each element $R_i \in \mathbf{R}$ there is a region corresponding to $I(S; R_i)$. (B-E) For each
subset $\mathbf{A}$ of $\mathbf{R}$ with two or more elements, $I(S; \mathbf{A})$ is depicted as a region containing $I(S; A)$ for all $A \in \mathbf{A}$ but not coextensive with
$\bigcup_{A \in \mathbf{A}} I(S; A)$. Regions of the diagram intersect generically, representing all possibilities for redundancy.







FIG. S3: Computing the PI-decomposition for 3-variable interaction information. (A-B) Term-by-term calculation of
$I(S; R_1; R_2) = I(S; R_1, R_2) - I(S; R_1) - I(S; R_2)$. Blue and red regions represent PI-terms that are added and subtracted, respectively.

FIG. S4: Computing the PI-decomposition for 4-variable interaction information. (A-F) Term-by-term calculation of
$I(S; R_1; R_2; R_3) = I(S; R_1, R_2, R_3) - I(S; R_1, R_2) - I(S; R_1, R_3) - I(S; R_2, R_3) + I(S; R_1) + I(S; R_2) + I(S; R_3)$. Blue and red
regions represent PI-terms that are added and subtracted, respectively. Green regions represent PI-terms that are subtracted twice.



